The work of Lee et al. is theoretically well founded and thoroughly motivated
by practical data analysis. The algorithm presented has the following
important properties:

1. Hierarchical clustering using a novel, adaptive, eigenvector-related,
agglomerative criterion.

2. Principal components analysis carried out locally, leading to the required
sample size for consistency being logarithmic rather than linear; and
computational time being quadratic rather than cubic.

3. Multiresolution transform with interesting characteristics: data-adaptive
at each node of the tree, orthonormal, and the tree decomposition itself
is data-adaptive.
4. Integration of all of the following: hierarchical clustering, dimensionality 
reduction, and multiresolution transform. 
5. Range of data patterns explored, in particular, block patterns in the
covariances, and "model" or pattern contexts.

While I admire the work of the authors, I have a different
point of view on key aspects of this work:

1. The highest dimensionality analyzed seems to be 760 in the Internet
advertisements case study. In fact, the quadratic computational time
requirements (Section 2.1 of Lee et al.) preclude scalability. My approach
in Murtagh (2007a) to wavelet transforming a dendrogram is of linear
computational complexity (for both observations, and attributes) in the
multiresolution transform. The hierarchical clustering, to begin with, is
typically quadratic for the n observations, and linear in the p attributes.
These computational requirements are necessary for the "small n, large
p" problem which motivates this work (Section 1). In particular, linearity
in p is a sine qua non for very high dimensionality data exploration.
Since L = O(p) in Section 2.1, this cubic time requirement has to be
alleviated, in practice, through limiting L to a user-specified value.

2. The local principal components analysis (Section 2.1) inherently helps 
with data normalization, but it only goes some distance. For qualitative, 
mixed quantitative and qualitative, or other forms of messy data, I would 
use a correspondence analysis to furnish a Euclidean data embedding. 
This, then, can be the basis for classification or discrimination, benefiting 
from the Euclidean framework. See Murtagh (2005). 

3. My final point is in relation to the following (Section 1): "The key
property that allows successful inference and prediction in high-dimensional
settings is the notion of sparsity." I disagree, in that sparsity of course 
can be exploited, but what is far more rewarding is that high dimensions 
are of particular topology, and not just data morphology. 

This is shown in the work of Hall et al. (2005), Ahn et al. (2007), Donoho
and Tanner (2005) and Breuel (2007), as well as Murtagh (2004). What
this leads to, potentially, is the exploitation of the remarkable simplicity
that is concomitant with very high dimensionality: Murtagh (2007b).
Applications include text analysis in many varied settings, and
high-frequency financial and other signal analysis.
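
One simple manifestation of such high-dimensional regularity can be sketched
as follows. The sketch assumes independent standard Gaussian coordinates, and
the values of n, p and the seed are chosen purely for illustration. The
function name distance_spread is hypothetical, and the code is not taken from
any of the works cited above. It compares the largest and smallest pairwise
distances among a few points; as p grows with n fixed, the ratio is expected
to approach 1, so that the points become nearly equidistant.

```python
import math
import random


def distance_spread(n, p, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance
    among n points with independent standard Gaussian coordinates in
    p dimensions (an illustrative assumption, not a model from the text)."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # All n * (n - 1) / 2 pairwise distances.
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return max(dists) / min(dists)


# Same number of points, low versus very high dimensionality.
spread_low = distance_spread(n=10, p=2)
spread_high = distance_spread(n=10, p=5000)
print(f"p = 2:    max/min distance ratio = {spread_low:.2f}")
print(f"p = 5000: max/min distance ratio = {spread_high:.2f}")
```

A ratio close to 1 in the high-dimensional case indicates a simple, nearly
regular configuration of the points, which is a property of the space rather
than of any sparsity in the data.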

In conclusion, I thank the authors for their thought-provoking and
motivating work.